sql/opt: decompile plan gists into PlanGrams#152988
Conversation
|
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
98a9661 to
3fdd37f
Compare
747d362 to
d357f00
Compare
|
@michae2 @ZhouXing19 friendly ping :) |
michae2
left a comment
There was a problem hiding this comment.
Noticed some things to fix, but overall this
Nice work!! It's exciting to see this come together!
@michae2 reviewed 14 files and all commit messages, and made 14 comments.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on DrewKimball and ZhouXing19).
-- commits line 8 at r2:
Nice, this is a great idea!
pkg/sql/opt/exec/explain/decompile.go line 39 at r4 (raw file):
err = errors.NewAssertionErrorWithWrappedErrf(err, "decompile produced invalid PlanGram") if buildutil.CrdbTestBuild { panic(err)
not sure about this
pkg/sql/opt/exec/explain/testdata/plangram line 46 at r4 (raw file):
root: n0; n0: (Scan Table="foo" Index="foo_pkey")
I think it's going to be important to support HasConstraint and HasLimit (in a future PR) to distinguish between a full scan and a constrained scan.
pkg/sql/opt/exec/explain/testdata/plangram line 426 at r4 (raw file):
(Project n3) | n3; n3: (ZigzagJoin);
Looks like we're missing table and index fields for ZZJ.
pkg/sql/opt/exec/explain/testdata/plangram line 458 at r4 (raw file):
│ size: 1 column, 1 row │ └── • subquery
Interesting that the gist has the subquery, but doesn't have the scalar expression in the values referencing the subquery... just barely not enough information to reconstruct the opt plan.
pkg/sql/opt/exec/explain/testdata/plangram line 521 at r4 (raw file):
• explain plangram: root: (Explain);
We should add Explain to the transparent ops. (And any other operators that return false in plangram.VisibleToPlanGram... oh, actually Explain should be the only one that survives execbuild.)
Hmm, maybe we need some kind of assertion that VisibleToPlanGram is true for every operator added to a PlanGram during building.
pkg/sql/opt/exec/explain/testdata/plangram line 675 at r4 (raw file):
| (PlaceholderScan Table="abc" Index="abc_pkey"); # createViewOp
typo: extra character here
pkg/sql/opt/exec/explain/testdata/plangram line 804 at r4 (raw file):
root: (Sort (WithScan));
I think there should be a (With) parent wrapping the (WithScan)? We might need to add some special handling to the decompiler to follow buffer labels. (Could be a future PR.)
pkg/sql/opt/props/physical/plangram.go line 299 at r2 (raw file):
// the rules go on their own indented lines (with `|` prefixing alts); // otherwise the production stays on a single line. writeProduction := func(name string, rules []planGramTerm) {
bikeshedding: could we call it formatProduction?
pkg/sql/opt/props/physical/plangram.go line 374 at r3 (raw file):
// // A PlanGramBuilder is not safe for concurrent use. type PlanGramBuilder struct {
This turned out really nice!
pkg/sql/opt/props/physical/plangram.go line 389 at r3 (raw file):
// // Production names must not be empty, must not start with a digit, must not // contain `"'\,():;=|` or whitespace, and must not be "any" or "none". Calling
nit: This sentence is already contained in the comment for planGramProduction.name, let's just have it in one place.
pkg/sql/opt/props/physical/plangram.go line 458 at r3 (raw file):
// AddField adds a field constraint to the current expression. Must precede // any child added via EnterExpr or Ref*. func (b *PlanGramBuilder) AddField(field PlanGramField) error {
We should also add a validateFieldKey that checks the key for punctuation and whitespace.
Hmm, actually, it could probably be the same as validateNonterminalName.
pkg/sql/opt/props/physical/plangram.go line 608 at r3 (raw file):
if strings.ContainsAny(name, `"'\,():;=|`) { return errors.Newf("name %q contains invalid character", name) }
I just realized: it would also be good to check for whitespace here (maybe using unicode.IsSpace or something).
michae2
left a comment
There was a problem hiding this comment.
@michae2 made 1 comment and resolved 1 discussion.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on DrewKimball and ZhouXing19).
pkg/sql/opt/exec/explain/decompile.go line 39 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
not sure about this
Eh, ignore me.
When called with newlines=true, FormatPretty now also indents nested parenthesized expressions to their paren depth, and puts each rule of a multi-rule production on its own indented line with a leading `|` for alternates. The single-line form (newlines=false, used by Format and String) is unchanged. Single-rule productions whose root expression has no nested expressions still print on one line — e.g. `root: (Scan);` or `n0: (InnerJoin n1 n2);` — so trivial cases stay compact. Epic: none Release note: None
Add PlanGramBuilder, a stateful builder for constructing PlanGrams, modeled on explain.OutputBuilder. Enter*/Leave* push and pop frames; AddField and Ref* methods apply to the current frame. Forward references and cycles are resolved at Build; the zero value is ready to use; Reset reuses the underlying map and slice memory. ParsePlanGram is rewired to drive the same builder so construction invariants are validated in one place. Epic: none Release note: None
DrewKimball
left a comment
There was a problem hiding this comment.
TFTR!
@DrewKimball made 14 comments and resolved 5 discussions.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on michae2 and ZhouXing19).
Previously, michae2 (Michael Erickson) wrote…
Nice, this is a great idea!
Thanks!
pkg/sql/opt/exec/explain/decompile.go line 39 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Eh, ignore me.
Eh, I agree that it's questionable. I'll remove this wrapping+panic (and elsewhere), and we can let the caller decide how concerning the error is.
pkg/sql/opt/props/physical/plangram.go line 299 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
bikeshedding: could we call it
formatProduction?
Done.
pkg/sql/opt/props/physical/plangram.go line 374 at r3 (raw file):
Previously, michae2 (Michael Erickson) wrote…
This turned out really nice!
Thanks!
pkg/sql/opt/props/physical/plangram.go line 389 at r3 (raw file):
Previously, michae2 (Michael Erickson) wrote…
nit: This sentence is already contained in the comment for
planGramProduction.name, let's just have it in one place.
Done.
pkg/sql/opt/props/physical/plangram.go line 458 at r3 (raw file):
Previously, michae2 (Michael Erickson) wrote…
We should also add a
validateFieldKeythat checks the key for punctuation and whitespace.Hmm, actually, it could probably be the same as
validateNonterminalName.
Done. I added a separate validateFieldKey that doesn't have the digit check on the first character, though we can tighten things up if we decide field keys shouldn't start with a digit either.
pkg/sql/opt/props/physical/plangram.go line 608 at r3 (raw file):
Previously, michae2 (Michael Erickson) wrote…
I just realized: it would also be good to check for whitespace here (maybe using
unicode.IsSpaceor something).
Good point, Done.
pkg/sql/opt/exec/explain/testdata/plangram line 46 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
I think it's going to be important to support HasConstraint and HasLimit (in a future PR) to distinguish between a full scan and a constrained scan.
Agreed, and already brewing one :)
pkg/sql/opt/exec/explain/testdata/plangram line 426 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Looks like we're missing table and index fields for ZZJ.
Good catch, fixed it.
pkg/sql/opt/exec/explain/testdata/plangram line 458 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Interesting that the gist has the subquery, but doesn't have the scalar expression in the values referencing the subquery... just barely not enough information to reconstruct the opt plan.
Hm, yeah... I wonder if we could match subqueries using their SQL and/or ID?
pkg/sql/opt/exec/explain/testdata/plangram line 521 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
We should add Explain to the transparent ops. (And any other operators that return false in
plangram.VisibleToPlanGram... oh, actually Explain should be the only one that survives execbuild.)Hmm, maybe we need some kind of assertion that
VisibleToPlanGramis true for every operator added to a PlanGram during building.
Done.
pkg/sql/opt/exec/explain/testdata/plangram line 675 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
typo: extra character here
Done.
pkg/sql/opt/exec/explain/testdata/plangram line 804 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
I think there should be a (With) parent wrapping the (WithScan)? We might need to add some special handling to the decompiler to follow buffer labels. (Could be a future PR.)
True. I'll leave a TODO for myself and address in the follow-up.
Add DecompileToPlanGram, which walks an explain.Node tree (produced
by decoding a plan gist) and emits a physical.PlanGram describing the
set of optimizer plans matching the gist's structure.
Each exec node maps to one or more optimizer operators — an
alternation when the gist can't disambiguate. Examples include Scan
vs PlaceholderScan, LookupJoin vs Lock, UnionAll vs
LocalityOptimizedSearch, and the choice among UNION/INTERSECT/EXCEPT
once the gist's ALL bit narrows it. Children of an alternation node
are promoted to their own productions so each node is visited
exactly once.
Two execbuilder shape transforms are inverted:
- Hash and merge joins for LeftSemi/LeftAnti are commuted to
RightSemi/RightAnti when the right input is the smaller side;
decompileChildren swaps the children back to optimizer order.
- simpleProjectOp is emitted both for real ProjectExprs and for
execbuilder wrappers (applyPresentation, buildGroupBy). The
decompiler emits the wrap-alt "(Project child) | child" so the
matcher can try both shapes.
The decompiler is expected to emit a structurally-valid PlanGram, so
any builder error indicates a bug. The error is wrapped as an
assertion failure and either panics (test builds) or is returned
(production).
A "plangram" datadriven file driven from TestExplainBuilder covers
the mapping; each case checks that the PlanGram derived from the
original explain tree matches the one derived from the round-tripped
gist.
Fixes cockroachdb#152061
Release note: None
|
/trunk merge |
|
😎 Merged successfully - details. |
This branch adds a plan gist decompiler: given a CockroachDB plan gist
(a compact encoding of a query plan), produce a
physical.PlanGram(regular tree grammar) describing the set of optimizer plans whose
structure matches the gist. The decompiled grammar can then be used to
pin physical plans.
The branch is structured as three commits, each independently
reviewable:
sql/opt/props/physical: indent expressions in FormatPretty—cosmetic prerequisite. Adds a
newlines=truemode forFormatPrettythat indents nested expressions and breaks multi-rule productions
onto their own lines, making the testdata in the third commit
readable.
sql/opt/props/physical: add PlanGramBuilder— a statefulEnter*/Leave*/AddField/Ref*/Build/ResetAPI forconstructing PlanGrams.
ParsePlanGramis rewired onto it so allconstruction invariants are validated in one place.
sql/opt/exec/explain: decompile plan gists into PlanGrams—the decompiler itself, plus a
plangramdatadriven file thatround-trips each test case (original explain tree vs gist-decoded
explain tree) and asserts the two derived PlanGrams are equal.
To make the grammar match the optimizer plan, the decompiler reverses
two execbuilder transformations: hash/merge joins commuted to
RightSemi/RightAntiget their inputs swapped back, andsimpleProjectwrappers are encoded as(Project child) | childsothe matcher can try both shapes. Other ambiguities the gist can't
resolve (e.g.
UnionAllvsLocalityOptimizedSearch, set opsconstrained only by the ALL bit) are encoded as alternations across
the surviving optimizer operators.
Fixes #152061
Release note: None